Single-cell transcriptomic analysis of vascular endothelial cells in zebrafish embryos

Vascular endothelial cells exhibit substantial phenotypic and transcriptional heterogeneity which is established during early embryogenesis. However, the molecular mechanisms involved in establishing endothelial cell diversity are still not well understood. Zebrafish has emerged as an advantageous model to study vascular development. Despite its importance, the single-cell transcriptomic profile of vascular endothelial cells during zebrafish development is still missing. To address this, we applied single-cell RNA-sequencing (scRNA-seq) of vascular endothelial cells isolated from zebrafish embryos at the 24 hpf stage. Six distinct clusters or subclusters related to vascular endothelial cells were identified which include arterial, two venous, cranial, endocardial and endothelial progenitor cell subtypes. Furthermore, we validated our findings by characterizing novel markers for arterial, venous, and endocardial cells. We experimentally confirmed the presence of two transcriptionally different venous cell subtypes, demonstrating heterogeneity among venous endothelial cells at this early developmental stage. This dataset will be a valuable resource for future functional characterization of vascular endothelial cells and interrogation of molecular mechanisms involved in the establishment of their heterogeneity and cell-fate decisions.

. We have previously generated etv2 ci32Gt gene trap line which has an insertion of the Gal4 transcriptional activator within the etv2 genomic locus. Furthermore, etv2 ci32Gt+/− ; UAS:GFP embryos are phenotypically normal and faithfully recapitulate GFP expression in vascular endothelial, myeloid and red blood cells 18 .
Here, we performed scRNA-seq using a combination of Tg(kdrl:mCherry) and etv2 ci32Gt ;UAS:GFP reporter lines to generate transcriptomic profile of vascular endothelial cells isolated from zebrafish embryos at 24 hpf (hours post fertilization) stage, when blood circulation is first initiated. We identified transcriptional profile of 6 different subtypes of vascular endothelial cells, which include vascular progenitor, arterial, two different venous, endocardial, and mixed arteriovenous subtypes. We validated our findings by characterizing novel markers for arterial, venous, and endocardial cells. We experimentally confirmed the presence of two transcriptionally different venous cell subtypes, demonstrating unexpected heterogeneity among venous endothelial cells at this early developmental stage. This data provides a valuable resource for future functional characterization of vascular endothelial cells and interrogation of molecular mechanisms involved in the establishment of their heterogeneity.

Results
To obtain a single-cell transcriptional profile of vascular endothelial cells, we crossed the previously established Tg(kdrl:mCherry) and etv2 ci32Gt ; UAS:GFP lines 14,18 . GFP and mCherry positive embryos were collected at 24 hpf. Because the etv2 ci32Gt ; UAS:GFP line labels vascular endothelial and some other cell types, such as blood, we decided to collect two different cell populations to help with subsequent annotation and line characterization. Thus, GFP + mCherry + and GFP + mCherry-cell populations were isolated by FACS (fluorescent activated cell sorting) and subjected to single-cell RNA-seq analysis using the Chromium Controller (10 × Genomics) followed by the next-generation sequencing (Fig. 1A). After filtering, transcriptomes of 5275 and 5543 cells were obtained from GFP + mCherry + and GFP + mCherry-cell populations, respectively. Transcriptomes from both cell populations were pooled for further analysis. Unbiased graph-based clustering was performed using the Partek flow analysis package, and 17 distinct clusters were identified and visualized using Uniform Manifold Approximation and Projection (UMAP) approach (Fig. 1B,C). We then assigned cell identities to these clusters based on previously known marker genes which were significantly enriched in each cluster and showed specific gene expression patterns ( Fig. 1B-F, Fig. S1 and Tables S1, S2). This revealed major cellular subtypes including five vascular endothelial, three red blood cell (RBC), sclerotome, two neural, macrophage, endothelial progenitor cell (EPC), epidermis, neutrophil, and myocytes. Vascular endothelial cells were mostly GFP + mCherry +, while RBC and macrophage clusters were largely GFP + mCherry− (Fig. 1E). This is consistent with known etv2 reporter expression in both endothelial and blood cells 18 , while kdrl:mCherry expression is largely restricted to vascular endothelial cells 14 .
Five distinct clusters related to vascular endothelial cells including vascular progenitor, arterial, two different venous, and mixed arteriovenous subtypes were obtained, underscoring the heterogeneity of vascular endothelial cells. We discuss these clusters in greater detail below.
Identification of an arterial cluster. Top marker genes for cluster #1 included dll4, dlc, and flt1, which are known to be expressed in the arterial cells [19][20][21] (Fig. 2A-E and Table S1). Many other known arterial markers, including vegfc, aqp1a.1 and hey2, were also enriched in cluster #1 20,22,23 (Fig. 2E and Table S1). In addition, cluster #1 also showed increased expression of unknown or previously uncharacterized genes. We hypothesized that many of these genes would be also expressed in arterial cells. Indeed, in situ hybridization (ISH) analysis of two selected previously uncharacterized genes, esm1 and notchl, demonstrated their expression in the dorsal aorta and intersegmental vessels (ISV) (Fig. 2G,H and Table S1). esm1 expression appeared enriched in the sprouting ISVs and was reduced in non-sprouting segments of the DA, suggesting further heterogeneity within the arterial population. In an attempt to gain insights into the biological processes represented by enriched genes in cluster #1, we performed Gene ontology (GO) and Pathway enrichment analysis. Top GO biological processes represented by gene set in cluster #1 included dorsal aorta morphogenesis, aorta morphogenesis, and dorsal aorta development ( Fig. 2F and Table S3). Furthermore, pathway analysis revealed enrichment of Hedgehog signaling pathway, known to play an important role in arterial development 24 , including the regulation of size of the dorsal aorta 25 (Table S4). Other enriched pathways included histamine H1 receptor signaling pathway, which is known to regulate arterial vasodilation 26 , as well as PI3 kinase, integrin signaling and EGF receptor signaling pathways (Table S4), which have been implicated in several steps of vascular development including angiogenesis 27 , vasculogenesis 28 and arterial development 29,30 . Identification of two subtypes of venous cells. Top marker genes for cluster #2 included lyve1b and sele, both known to be enriched in venous cells 31,32 (Fig. 3A,C and Table S1). Many other known venous marker genes, such as flt4, stab1, mrc1a and stab2, were also enriched in cluster #2 [33][34][35][36] (Fig. 3E and Table S1). In addition to known venous markers, cluster #2 included many genes with unknown or previously uncharacterized expression pattern, such as si:dkey-28n18.9 (Fig. 3B,D,E and Table S1). We hypothesized that many of these uncharacterized genes would also be expressed in venous cells. We selected two previously uncharacterized genes, bcl6b and si:dkey-28n18. 9, and analyzed their expression by ISH. Consistent with our hypothesis, we observed expression of bcl6b and si:dkey-28n18.9 in the posterior cardinal vein (PCV) (Fig. 3G,H). Top GO biological processes included endothelial tube morphogenesis, blood vessel lumenization and morphogenesis of an endothelium as well as PI3 signaling and lymphangiogenesis ( Fig. 3F and Table S3). Highly represented pathways included PI3 kinase, Insulin/IGF, Integrin and Ras signaling pathways (Table S4). PI3 kinase pathway has been previously implicated in arterial-venous specification 30 , while Insulin/IGF pathway has established roles in angiogenesis 37 .
Other top pathways such as Integrin signalling have been shown to play a crucial role in vasculogenesis 28      www.nature.com/scientificreports/ Ras signaling pathway has also been implicated to regulate arterial-venous specification in zebrafish 38 . In addition, VEGF, FGF and Notch signaling pathways, which have established roles in vascular development and arteriovenous differentiation, were also among top pathways identified by the gene list in cluster #2 39,40 (Table S4). Interestingly, we also identified a second cluster, cluster #7, which showed enriched expression of known venous marker genes. Top marker genes for cluster #7 included glula, known to be enriched in venous cells 41 (Fig. 4A,C,E and Table S1). Other known venous markers, including mrc1a, stab2 and dab2, were also enriched in cluster #7 35,36,42 (Fig. 4E and Table S1). In addition to the known venous markers, top marker genes from this cluster also included many previously uncharacterized/unknown genes, such as cx30. 3 and otc (Fig. 4B,D,E and Table S1). We performed ISH to confirm if these genes were expressed in venous cells in zebrafish. Indeed, both cx30. 3 and otc demonstrated expression in the PCV and the caudal vein (Fig. 4G,H). cx30.3 expression was enriched in the caudal vein, and only weak expression in the PCV was apparent. In addition, cx30.3 showed strong expression in the cranial and trunk neural crest cells, while otc showed strong expression in the lens. Highly represented GO biological processes for cluster #7 included terms such as apical protein localization, endothelial tube morphogenesis, blood vessel lumenization, and morphogenesis of an endothelium ( Fig. 4F and Table S3). Enriched pathways included Insulin/IGF, Hypoxia-induced factor (HIF), PI3 kinase, and Hedgehog signaling pathways (Table S4), which have been all implicated in several steps of vascular development, including angiogenesis 37,43 , arteriovenous specification 30 , and establishing vascular integrity 44 .
We next asked if there was a significant biological difference between the two separate venous clusters and if they represented two distinct subsets of venous cells. To examine this, we performed fluorescent in situ hybridization for selected top markers of the two venous clusters, lyve1b (cluster #2) and glula (cluster #7), using Tg(kdrl:GFP) embryos at 24 hpf stage. There was a significant overlap between lyve1b and glula expression within the PCV in the trunk region. Intriguingly, the caudal vein (CV) displayed glula expression, while lyve1b expression was nearly absent (Fig. 4I-K). In contrast, a subset of endothelial cells within the PCV displayed strong lyve1b expression and showed very little glula expression ( Fig. 4I-K), suggesting heterogeneity in the venous cell population. These results suggest that cluster #7 is enriched in the caudal vein cells. tfec, a previously characterized marker for caudal endothelial cells 45 , was preferentially enriched in the cluster #7 (Table S1), supporting its annotation as the caudal vein.
We have previously reported transcriptomic profiles of arterial and venous endothelial cells identified by scRNA-seq analaysis of zebrafish trunks at 30 hpf 46 . A substantial overlap was observed between 24 and 30 hpf datasets in both arterial and venous marker expression (Table S5). While there was only a single venous specific cluster identified at 30 hpf, its top marker genes were shared between 24 hpf Venous-1 and Venous-2 clusters, reflecting a substantial overlap between the two venous clusters and suggesting that many arterial and venous markers identified at 24 hpf continue their specific expression through later stages.

Identification of vascular endothelial cluster co-expressing both arterial and venous markers.
Top marker genes for cluster #6 included cldn5b, known to be enriched in arterial vascular endothelial cells 47 (Fig. 5A,C). In addition, known venous markers dab2 and lyve1b as well as pan-endothelial marker kdrl were also enriched in cluster #6 31,33 (Fig. 5B,D,E and Table S1). Other top markers for cluster #6 included rab11bb, cox4i2, fank1, and krt18 (krt18 had the lowest p value). Expression of these markers either has not been previously characterized or they are known to be expressed in many different cell types, including the vascular endothelium (Table S1). To identify which cells in a zebrafish embryo co-express arterial and venous markers, we performed fluorescent in situ for the arterial marker cldn5b and the venous marker dab2 in Tg(kdrl:GFP) embryos. Intriguingly, cldn5b and dab2 showed co-expression within a subset of cranial vasculature including venous primordial hindbrain channels (PHBCs) and mid cerebral vein (MCeV) at 24 and 30 hpf (Fig. 6A-F). In contrast, consistent with previous studies, expression of cldn5b and dab2 was restricted to the DA and PCV, respectively, in the trunk vasculature and did not overlap 48 (Fig. 6G-I). Other cluster #6 markers cox4i2 and krt18 also showed expression in PHBCs and MCeV (Fig. 6J,K). These results suggest that cluster #6 corresponds to a subset of cranial vasculature, which shows a distinct molecular identity from other types of vascular endothelial cells.
Highly represented GO biological processes for cluster #6 included vascular endothelial growth factor receptor, Arp2/3 complex-mediated actin nucleation, regulation of vascular permeability and endothelium development ( Fig. 5F and Table S3). Enriched pathways included histamine H2 receptor signaling pathway, implicated in vasodilation in brain capillaries 49 , cytoskeletal regulation by Rho GTPase, implicated in vascular integrity and tubulogenesis 50,51 , and angiotensin II-stimulated signaling through G proteins and beta-arrestin, known to regulate vasoconstriction and blood pressure 52 (Table S4).

Identification of endothelial progenitor cells (EPC). Top marker genes for cluster #14 included npas4l
and lmo2, known to be enriched in endothelial progenitor cells 33,53 (Fig. 7A,C and Table S1). Other known EPC markers, including tal1/scl and etv2, were also enriched in cluster #14 15,54 (Fig. 7B,D,E and Table S1). Highly represented GO biological processes included actin filament-based transport, endothelial cell migration, vasculogenesis, non-canonical Wnt signaling pathway, endothelial cell differentiation, and artery development ( Fig. 7F and Table S3). Interestingly, GO biological processes pertaining to lymph vessel development were also enriched. Enriched pathways included PDGF, integrin, TGF-beta and Wnt signaling pathways (Table S4). PDGF signal-  www.nature.com/scientificreports/ ing pathway has been shown to induce proliferation, migration, and angiogenesis of EPCs 55 . Similarly, Integrin signaling has been shown to be a major regulator of EPC mobilization, homing, invasion and differentiation 56 . TGF-beta and Wnt signaling pathways have been implicated in differentiation and specification of EPCs 57,58 .
To identify developmental relationship between all 5 vascular endothelial clusters, we performed developmental trajectory analysis using Monocle 3 59 (Fig. 7G). The trajectory analysis suggested that EPC cells transition into 3 distinct branches: an arterial and 2 venous. The two venous branches merged again, perhaps reflecting a significant overlap between the two cell populations. Venous-1 population transitioned into VE-other, which, as our results suggest, corresponds to a subset of cranial vasculature. Some VE-other cells were positioned in the vicinity of arterial population, reflecting mixed arteriovenous identity of VE-other cluster. endothelial clusters, some other known endothelial cell subtypes, including endocardial cells, lymphatic progenitors, hemogenic endothelium, tip and stalk cells, were not identified using this approach. In an attempt to define additional vascular endothelial cell types, we performed subclustering using vascular endothelial cells only. However, such approach did not identify any additional meaningful clusters beyond what we have already demonstrated. Therefore, we focused on the endocardial cells and attempted to identify their transcriptional profile using the targeted clustering approach based on expression of known marker genes. Endocardial cells are known to share a large transcriptional similarity with other vascular endothelial cells 33,60-63 , but they also have their unique identity known by expression of marker genes such as nfatc1 and gata5 64,65 . Recent studies have described endocardial transcriptome at 15 and 20-somite stages in zebrafish embryos 48,66 . However, endocardial transcriptome at 24 hpf stage has not been previously reported. Therefore, to identify the transcriptomic signature of endocardial cells, we subclustered endothelial cells based on the expression of nfatc1 and gata5 (Fig. 8A-C). This resulted in the identification of a single subcluster comprised of 21 cells. Top marker genes for this subpopulation, besides gata5, included has2 and fn1a, all known to have endocardial specific expression 64,65,67-69 ( Fig. 8D and Table S6). Known zebrafish endocardial markers klf2a and notch1b were also enriched in this cluster 70,71 (Table S6). Heart-specific expression has been reported for many other genes in this cluster including id2b, tbx20, spock3, ece1 and smad6a, although it was not investigated whether this expression was endocardial or myocardial [72][73][74][75] . We previously reported the transcriptome of an endocardial cluster identified during scRNAseq analysis of etv2 ci32Gt ; UAS:GFP embryos at the 20-somite stage (19 hpf) 48 . There was a significant overlap of marker genes, such as fn1a, gata5, id2b, wnt11r, bmp16 and bambib between the two endocardial clusters (Table S7). A recent study reported the zebrafish endocardial transcriptome at the 15-somite stage 66 . There was an extensive overlap between our endocardial dataset and the one reported by Capon et al. (Table S7), further validating endocardial identity of the nfatc1 + gata5 + cell cluster. As an additional confirmation, we analyzed expression of 3 selected markers wnt11r, bmp16 and bambib, endocardial expression of which has not been previously reported. In situ hybridization confirmed the expression of wnt11r, bambib and bmp16 in the heart at 24 hpf (Fig. 8E-G). To gain further insights into the signature of the endocardial cells relative to endothelial cells, we used Monocle 3 to analyze the position of endocardial cells in the trajectory plot. Majority of the endocardial gata5 + nfatc1 + cells clustered within or next to the arterial cluster. However, a few endocardial cells were also present within the venous cluster ( Figure S2), suggesting a unique signature and heterogeneity between endocardial cells at 24 hpf. Altogether, we have identified the transcriptomic signature of zebrafish endocardial cells and validated new zebrafish endocardium markers.

Discussion
In this study we used scRNA-seq analysis to identify transcriptomic signatures of vascular endothelial cells in zebrafish embryos at 24 hpf, at the onset of blood circulation. Our profiling of 10,818 cells identified 6 different subtypes of vascular endothelial cells which included vascular progenitor, arterial, two different venous, endocardial, and cranial endothelial subtypes. Importantly, we have validated new markers for arterial, two different venous, and endocardial populations. Intriguingly, two different venous populations were identified during scRNA-seq analysis. The PCV is known to harbor a heterogenous cell population including progenitors of lymphatic and subintestinal vasculature [76][77][78] . Many markers including lyve1b overlap between venous and lymphatic cells during development 79 making it difficult to distinguish between the two lineages. Prox1a and prox1b are considered to be the definitive markers of lymphatic lineage [80][81][82][83] . Both markers were similarly enriched in the venous-1 (cluster #2, log 2 values 2.85 and 3.96, respectively) and venous-2 (cluster #7, log 2 values 2.97 and 3.46, respectively, Table S1) clusters. Therefore, it is unlikely that either of the clusters corresponds to lymphatic progenitors. Only a small number of cells were positive for either prox1a or prox1b, and they did not cluster together, preventing us from identifying lymphatic transcriptome at this developmental stage. While subintestinal progenitors also originate from the PCV starting at 28 hpf [76][77][78] , it is unclear if these cells have a unique transcriptomic signature at these early stages. Our data suggest that the venous-2 population is enriched in markers for the caudal vein. The caudal vein is known to have a unique transcriptional profile 84,85 and also serves as the niche for hematopoietic stem cells 86 . Therefore, transcriptional characterization of caudal vein endothelium will help to understand pathways involved in its development.
An identity of cluster #6 (VE-other) is the most nebulous. Many top marker genes for this cluster are poorly characterized. Marker genes with known endothelial specific expression pattern include pan-endothelial kdrl, arterial cldn5b and venous specific lyve1b and dab2 (Table S1). Although the spatial and temporal origin of arteries and veins has been well established, the precise timing when arterial and venous-fated precursor cells differentiate into distinct arterial and venous vessels has been controversial [87][88][89] . We have previously shown coexpression of venous and arterial genes in many arterial progenitors at the 20-somite stage in the trunk region of zebrafish embryos. However, at later stages of development (24 hpf) the expression of venous and arterial markers becomes restricted to veins and arteries, respectively 48 , in the trunk vasculature. In contrast, we show here that arterial and venous genes are co-expressed in a subset of cranial vasculature including PHBC and MCeV at 24  www.nature.com/scientificreports/ hpf as well as 30 hpf. These results suggest mixed arteriovenous identity of cranial vasculature during embryonic development. Cells within cranial vasculature also show strong expression of other cluster #6 markers, such as cox4i2 and krt18, further supporting the argument that cluster #6 corresponds to a subset of cranial vasculature. However, we cannot rule out the possibility that cells with similar gene expression signature are also located elsewhere in the embryo. It is also possible that such expression pattern may represent a transitory phase of differentiation during embryonic development. Endocardial cells share many similarities with vascular endothelial cells. However, they are also known to harbor a unique transcriptomic signature as evident by the expression of nfatc1 64,90 . Differences in the transcriptomes between endocardial and other types of vascular endothelial cells have not been well characterized during embryonic development, and only a limited number of endocardial-specific markers are currently known in any vertebrate model system. We have previously reported a transcriptional signature of zebrafish endocardial progenitors identified during scRNA-seq analysis of etv2 reporter embryos at the 20-somite stage (19 hpf) 48 . In addition, a recent study performed RNA-seq using an endocardial specific reporter line to describe the endocardial transcriptome at the 15-somite stage 66 . Here we performed subclustering based on the established markers gata5 and nfatc1 to define the endocardial transcriptome at 24 hpf. A significant overlap between the endocardial marker genes at 15-somite, 20-somite and 24 hpf stages was observed which confirms the endocardial identity of gata5 + nfatc1 + cells. Multiple genes in this subcluster have been previously reported to show heart-specific expression, although most studies did not distinguish between the endocardial and myocardial expression. We further validated and demonstrated cardiac expression for 3 previously uncharacterized genes from this subcluster. Identification of the endocardial signature will be a useful resource for many investigators involved in cardiovascular research.
In summary, we identified transcriptional signatures of vascular endothelial cells during zebrafish development and described six different subtypes of endothelial-related cells at 24 hpf, when blood circulation is first initiated. This is one of the first reported scRNA-seq datasets of vascular endothelial cells during zebrafish development. We expect that this will be a useful resource for many researchers in cardiovascular development and will promote investigation of molecular mechanisms involved in the establishment of vascular heterogeneity.

Materials and methods
Zebrafish lines and maintenance. All studies were performed following standard guidelines and protocols approved by the University of South Florida and Cincinnati Children's Hospital Medical Center Institutional Animal Care and Use Committee. Embryos were raised at 28.5 °C in embryo medium and staged by hours post fertilization (hpf) 91 . Zebrafish lines used in the study are as follows: wild-type AB, wild-type EKK, Tg(kdrl:GFP) s84392 , Tg(kdrl:mCherry) 14 and Tg(etv2 ci32Gt ; UAS:GFP) 18 .
Embryo dissociation and single-cell RNA-seq. Tg(kdrl:mCherry) fish were crossed to etv2 ci32Gt ; UAS:GFP line. Embryos double positive for both mCherry and GFP fluorescence were selected at 24 hpf. Embryos were dechorionated and immediately transferred to a 1.5 ml Eppendorf tube placed on ice for the generation of single-cell suspension. Whole embryos were then dissociated into a single-cell suspension by cold protease tissue dissociation protocol 93 . Cells expressing GFP only and both GFP and mCherry were sorted using a BD/FACSAria II flow cytometer cell sorter. A suspension of ~ 10,000 single cells was loaded onto the 10XGenomics Single Cell 3' (V3 chemistry) chip at the CCHMC Gene Expression Core. 12 cDNA amplification cycles were used to generate cDNA. Samples were sequenced on an Illumina NovaSeq 6000 instrument (Illumina, San Diego, CA) running an S2 flow cell with parameters as follows: Read1, 28 cycles; Index Read1: 8 cycles; Read 2: 91 cycles at the CCHMC DNA Sequencing core. The raw Fastq files obtained from the sequencing core were then mapped to the Danio rerio genome (version zv11) to generate single-cell feature counts using Cell Ranger version 3.0.2. Counts were performed on fastq data from each of the populations individually. Filtered matrix files were then imported into Partek Flow analysis Suite. Cells were filtered on total read per cell (477-61,000), expressed genes per cell (96-5456) and mitochondrial counts (0-10%) which resulted in 10,818 cells. The gene count CPM (counts per million) values were normalized by adding 1.0 and calculating log 2 values. GFP + and GFP + mCherry + samples were combined, and an unbiased graph-based clustering was performed. Finally, UMAP plots, violin plots, scatter plots, a dot plot, and a heatmap were obtained and exported from Partek. Endocardial cluster was identified based on the expression of marker genes gata5 and nfatc1 (log 2 gata5 and nfatc1 > 0).
WISH and HCR. Whole mount in situ hybridization was performed as described before 94 .
Ethical approval. The study is reported in accordance with ARRIVE guidelines 96 .

Data availability
The scRNA-seq datasets generated and/or analyzed during the current study are available in the NCBI GEO database under accession number GSE202912. All processed image datasets are included in this published article (and its Supplementary information files). Original image datasets generated during this study are available from the Lead contact upon reasonable request.